Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Authors

  • George GATUHA
  • Tao JIANG
Abstract

Association rule mining is an important technique for finding meaningful relationships in large datasets. Several frequent itemset mining techniques have been proposed that use the FP-tree, a prefix-tree structure that provides a compressed representation of the database. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining algorithms. Experimental results have demonstrated the efficiency of both data structures in frequent itemset mining. This work proposes FDM, a new algorithm based on the FP-tree and DIFFset data structures for efficiently discovering frequent patterns in data. FDM can adapt its characteristics to efficiently mine long and short patterns from both dense and sparse datasets. Several optimization techniques are also outlined to increase the efficiency of FDM. FDM was evaluated against three frequent itemset mining algorithms, dEclat, FP-growth, and FDM* (FDM without optimization), using datasets having both long and short frequent patterns. The experimental results show that FDM performs significantly better than the FP-growth, dEclat, and FDM* algorithms.
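To make the DIFFset idea concrete, the following is a minimal sketch of diffset-based support counting in the style of dEclat, one of the baseline algorithms named in the abstract. It is not the FDM algorithm, whose details are not given here. The function names (`tidsets`, `declat`, `mine`) and the toy transactions are assumptions made for illustration. The sketch relies on the standard diffset identities: for itemsets PX and PY sharing a prefix P, d(PXY) = d(PY) \ d(PX) and support(PXY) = support(PX) - |d(PXY)|.

```python
def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    result = {}
    for tid, items in enumerate(transactions):
        for item in items:
            result.setdefault(item, set()).add(tid)
    return result


def declat(prefix, members, min_support, out):
    """Recursively extend `prefix`.

    `members` is a list of (item, diffset, support) triples whose
    itemsets all share `prefix` (hypothetical layout for this sketch).
    """
    for i, (item_x, diff_x, sup_x) in enumerate(members):
        itemset = prefix + (item_x,)
        out[itemset] = sup_x
        extensions = []
        for item_y, diff_y, sup_y in members[i + 1:]:
            # d(PXY) = d(PY) \ d(PX)
            diff_xy = diff_y - diff_x
            # support(PXY) = support(PX) - |d(PXY)|
            sup_xy = sup_x - len(diff_xy)
            if sup_xy >= min_support:
                extensions.append((item_y, diff_xy, sup_xy))
        if extensions:
            declat(itemset, extensions, min_support, out)


def mine(transactions, min_support):
    """Return {itemset (sorted tuple): support} for all frequent itemsets."""
    all_tids = set(range(len(transactions)))
    members = []
    for item, tids in sorted(tidsets(transactions).items()):
        if len(tids) >= min_support:
            # Diffset of a single item: transactions that do NOT contain it.
            members.append((item, all_tids - tids, len(tids)))
    out = {}
    declat((), members, min_support, out)
    return out


# Toy data, assumed for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent = mine(transactions, min_support=3)
```

On dense data most items appear in most transactions, so the diffsets stay small compared with full transaction-id lists, which is the usual motivation for storing differences rather than the lists themselves.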


Similar resources

Discovery of Frequent Itemsets: Frequent Item Tree-Based Approach

Mining frequent patterns in large transactional databases is a highly researched area in the field of data mining. Existing frequent pattern discovery algorithms suffer from several problems when mining large amounts of data: high memory dependency, and high computational and I/O costs. Additionally, the recursive mining process to mine these structures is also too voracious in memory resource...


Discovering of Frequent Itemsets with CP-mine Algorithm

Efficient algorithms to discover frequent patterns are crucial in data mining research. Several effective data structures, such as two-dimensional arrays, graphs, trees, and tries, have been proposed to collect candidate and frequent itemsets. The tree structure appears to be the most attractive for storing itemsets. The most prominent tree proposed so far is the FP-tree, which is a prefix-t...


A Novel Data Mining Method to Find the Frequent Patterns from Predefined Itemsets in Huge Dataset Using TMPIFPMM

Association rule mining is one of the important data mining techniques. It finds correlations among attributes in huge datasets. Those correlations are used to improve future business strategy. The core process of association rule mining is to find the frequent patterns (itemsets) in a huge dataset. Countless algorithms are available in the literature to find the frequent items...


An Efficient Approach for Maintaining Association Rules Based on Adjusting FP-Tree Structures

In this paper, the issue of mining and maintaining association rules in a large database of customer transactions is studied. The maintenance of association rules can be mapped into the problem of maintaining frequent itemsets in the database. Because the mining of association rules is time-consuming, we need an efficient approach to maintain the frequent itemsets when the database is updated. ...


The Frequent Pattern List: Another Framework for Mining Frequent Patterns

The mining of frequent patterns (or frequent itemsets) plays an essential role in many data mining tasks. One major methodology for mining frequent patterns is the Apriori-based approach, which is computationally costly because many candidate itemsets have to be generated and verified. More recently, another approach using the Frequent-Pattern Tree (FP-tree) has been suggested to avoid the ...



Publication date: 2017